A Survey on Natural Language Text Copy Detection

نویسندگان

  • BAO Jun-Peng
  • SHEN Jun-Yi
  • LIU Xiao-Dong
  • SONG Qin-Bao
چکیده

Copy detection has very important application in both intellectual property protection and information retrieval. Currently, copy detection concentrates on document copy detection mainly. In early days, document copy detection concentrated on program plagiarism detection mainly and now the most studies are on text copy detection. In this paper, a comprehensive survey on natural language text copy detection is given, the developments of copy detection is introduced. The approaches and features of a variety of existing text copy detection systems or prototypes are reviewed in detail. Then some key detection techniques are listed and compared with each other. In the end, the future trend of text copy detection is discussed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Survey Paper on Data Mining Techniques: Outlier Detection and Text summarization

IJSER © 2014 http://www.ijser.org Abstract—In this paper, we will discuss all the researches we have find till. And what we have concluded from that survey. We try to compare and combine two subjects that are Natural language Processing and Data Mining. We will work on Outlier Detection and Text summarization. Data mining is the process of extraction of data that would be of any kind and Outlie...

متن کامل

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to the authors’ rights, plagiarism detection, is one of the critical problems in the field of text-mining that many researchers are interested in. This issue is considered as a serious one in high academic institutions. There exist language-free tools which do not yield any reliable results since the special features of every language are ignored in them. Considering the paucit...

متن کامل

Design a Persian Automated Plagiarism Detector (AMZPPD)

Currently there are lots of plagiarism detection approaches. But few of them implemented and adapted for Persian languages. In this paper, our work on designing and implementation of a plagiarism detection system based on preprocessing and NLP technics will be described. And the results of testing on a corpus will be presented. Keywords— External Plagiarism, Plagiarism, Copy detection, natural ...

متن کامل

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013

In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In sub-task of Source Retrieval, a method combined TF-IDF, PatTree and Weighted TF-IDF to extract the keywords of suspicious documents as queries to retrieve the plagiarism source document is proposed. In sub-task of Text Alignment, a method based on sentence similarity is presented. Our text alignment...

متن کامل

Spell Checker for Non Word Error Detection: Survey

Spell checker is a software tool which is used to detect the spelling errors in a text document. A spell checker can also provide suggestions to correct the misspellings. The error can be either non word error or real word error. Detecting real word error is really difficult task and requires advanced statistical and Natural Language Processing (NLP) techniques. Currently we have many methods f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003